Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 1991 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 140.1 KiB |
| Average record size in memory | 72.1 B |
Variable types
| Numeric | 9 |
|---|---|
Alerts
| Alert | Type |
|---|---|
| do is highly correlated with wqi | High correlation |
| tc is highly correlated with wqi | High correlation |
| wqi is highly correlated with do and 1 other field | High correlation |
| do is highly correlated with wqi | High correlation |
| wqi is highly correlated with do | High correlation |
| Unnamed: 0 is highly correlated with station | High correlation |
| station is highly correlated with Unnamed: 0 | High correlation |
| do is highly correlated with wqi | High correlation |
| bod is highly correlated with tc | High correlation |
| tc is highly correlated with bod and 1 other field | High correlation |
| wqi is highly correlated with do and 1 other field | High correlation |
| ph is highly skewed (γ1 = 27.34385201) | Skewed |
| tc is highly skewed (γ1 = 31.71407152) | Skewed |
| Unnamed: 0 is uniformly distributed | Uniform |
| Unnamed: 0 has unique values | Unique |
| na has 41 (2.1%) zeros | Zeros |
Reproduction
| Analysis started | 2022-01-23 17:02:31.334425 |
|---|---|
| Analysis finished | 2022-01-23 17:03:08.153828 |
| Duration | 36.82 seconds |
| Software version | pandas-profiling v3.1.1 |
Unnamed: 0
Real number (ℝ≥0)
| Distinct | 1991 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 995 |
| Minimum | 0 |
| Maximum | 1990 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 99.5 |
| Q1 | 497.5 |
| median | 995 |
| Q3 | 1492.5 |
| 95-th percentile | 1890.5 |
| Maximum | 1990 |
| Range | 1990 |
| Interquartile range (IQR) | 995 |
Descriptive statistics
| Standard deviation | 574.8965124 |
|---|---|
| Coefficient of variation (CV) | 0.5777854396 |
| Kurtosis | -1.2 |
| Mean | 995 |
| Median Absolute Deviation (MAD) | 498 |
| Skewness | 0 |
| Sum | 1981045 |
| Variance | 330506 |
| Monotonicity | Strictly increasing |
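The figures in this block are what one would expect if the column is simply the row index 0, 1, ..., 1990. A minimal sketch, assuming exactly that sequence (an assumption for illustration, not something the report states outright), recomputes several of them with the Python standard library:

```python
import statistics

# Assumption: the column holds the consecutive integers 0..1990.
values = list(range(1991))

total = sum(values)                        # "Sum" row
mean = statistics.mean(values)             # "Mean" row
variance = statistics.variance(values)     # sample variance (n - 1 denominator)
std = statistics.stdev(values)             # sample standard deviation
cv = std / mean                            # coefficient of variation = std / mean
median = statistics.median(values)
# Median absolute deviation: median of the absolute distances to the median.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, std, cv, mad)
```

If the printed values agree with the Sum, Mean, Variance, Standard deviation, CV and MAD rows, the column carries no information beyond row order, which is also what the "Uniform" and "Unique" alerts suggest.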
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1989 | 1 | 0.1% |
| 1334 | 1 | 0.1% |
| 1308 | 1 | 0.1% |
| 1310 | 1 | 0.1% |
| 1312 | 1 | 0.1% |
| 1314 | 1 | 0.1% |
| 1316 | 1 | 0.1% |
| 1318 | 1 | 0.1% |
| 1320 | 1 | 0.1% |
| 1322 | 1 | 0.1% |
| Other values (1981) | 1981 | 99.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1990 | 1 | 0.1% |
| 1989 | 1 | 0.1% |
| 1988 | 1 | 0.1% |
| 1987 | 1 | 0.1% |
| 1986 | 1 | 0.1% |
| 1985 | 1 | 0.1% |
| 1984 | 1 | 0.1% |
| 1983 | 1 | 0.1% |
| 1982 | 1 | 0.1% |
| 1981 | 1 | 0.1% |
station
Real number (ℝ≥0)
| Distinct | 320 |
|---|---|
| Distinct (%) | 16.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1947.967855 |
| Minimum | 2 |
| Maximum | 3473 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 1092 |
| Q1 | 1456 |
| median | 1861 |
| Q3 | 2336.5 |
| 95-th percentile | 3360 |
| Maximum | 3473 |
| Range | 3471 |
| Interquartile range (IQR) | 880.5 |
Descriptive statistics
| Standard deviation | 722.1032617 |
|---|---|
| Coefficient of variation (CV) | 0.3706956764 |
| Kurtosis | 0.2392620535 |
| Mean | 1947.967855 |
| Median Absolute Deviation (MAD) | 437 |
| Skewness | 0.03799323932 |
| Sum | 3878404 |
| Variance | 521433.1206 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1861 | 129 | 6.5% |
| 1400 | 10 | 0.5% |
| 1399 | 10 | 0.5% |
| 1572 | 10 | 0.5% |
| 1547 | 10 | 0.5% |
| 1094 | 10 | 0.5% |
| 1570 | 10 | 0.5% |
| 1151 | 10 | 0.5% |
| 1573 | 10 | 0.5% |
| 1642 | 10 | 0.5% |
| Other values (310) | 1772 | 89.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | 0.1% |
| 17 | 10 | 0.5% |
| 18 | 10 | 0.5% |
| 20 | 10 | 0.5% |
| 21 | 10 | 0.5% |
| 42 | 10 | 0.5% |
| 43 | 10 | 0.5% |
| 1023 | 9 | 0.5% |
| 1024 | 9 | 0.5% |
| 1025 | 9 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3473 | 3 | 0.2% |
| 3471 | 3 | 0.2% |
| 3468 | 3 | 0.2% |
| 3466 | 3 | 0.2% |
| 3465 | 3 | 0.2% |
| 3464 | 3 | 0.2% |
| 3460 | 3 | 0.2% |
| 3459 | 3 | 0.2% |
| 3458 | 3 | 0.2% |
| 3384 | 3 | 0.2% |
do
Real number (ℝ≥0)
| Distinct | 165 |
|---|---|
| Distinct (%) | 8.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.397422401 |
| Minimum | 0 |
| Maximum | 11.4 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.9 |
| Q1 | 5.95 |
| median | 6.7 |
| Q3 | 7.2 |
| 95-th percentile | 8 |
| Maximum | 11.4 |
| Range | 11.4 |
| Interquartile range (IQR) | 1.25 |
Descriptive statistics
| Standard deviation | 1.323062347 |
|---|---|
| Coefficient of variation (CV) | 0.2068117851 |
| Kurtosis | 3.847240171 |
| Mean | 6.397422401 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | -1.468481491 |
| Sum | 12737.268 |
| Variance | 1.750493974 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.7 | 131 | 6.6% |
| 6.8 | 118 | 5.9% |
| 6.9 | 103 | 5.2% |
| 7 | 98 | 4.9% |
| 6.6 | 91 | 4.6% |
| 7.2 | 87 | 4.4% |
| 7.1 | 82 | 4.1% |
| 7.3 | 79 | 4.0% |
| 6.5 | 64 | 3.2% |
| 6.4 | 63 | 3.2% |
| Other values (155) | 1075 | 54.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 0.2 | 1 | 0.1% |
| 0.5 | 1 | 0.1% |
| 0.6 | 4 | 0.2% |
| 0.7 | 2 | 0.1% |
| 0.8 | 2 | 0.1% |
| 0.9 | 2 | 0.1% |
| 1 | 2 | 0.1% |
| 1.1 | 1 | 0.1% |
| 1.2 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.4 | 1 | 0.1% |
| 11.1 | 1 | 0.1% |
| 10 | 2 | 0.1% |
| 9.9 | 2 | 0.1% |
| 9.8 | 1 | 0.1% |
| 9.6 | 2 | 0.1% |
| 9.4 | 1 | 0.1% |
| 9.3 | 1 | 0.1% |
| 9.2 | 1 | 0.1% |
| 9.1 | 1 | 0.1% |
ph
Real number (ℝ≥0)
| Distinct | 265 |
|---|---|
| Distinct (%) | 13.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 111.6696168 |
| Minimum | 0 |
| Maximum | 67115 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6.4 |
| Q1 | 6.9 |
| median | 7.3 |
| Q3 | 7.7 |
| 95-th percentile | 8.4 |
| Maximum | 67115 |
| Range | 67115 |
| Interquartile range (IQR) | 0.8 |
Descriptive statistics
| Standard deviation | 1875.161891 |
|---|---|
| Coefficient of variation (CV) | 16.79205092 |
| Kurtosis | 876.0255587 |
| Mean | 111.6696168 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 27.34385201 |
| Sum | 222334.207 |
| Variance | 3516232.118 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.2 | 138 | 6.9% |
| 7.3 | 134 | 6.7% |
| 7.4 | 127 | 6.4% |
| 7.1 | 118 | 5.9% |
| 7 | 112 | 5.6% |
| 7.6 | 110 | 5.5% |
| 6.9 | 102 | 5.1% |
| 7.7 | 97 | 4.9% |
| 7.8 | 96 | 4.8% |
| 7.5 | 91 | 4.6% |
| Other values (255) | 866 | 43.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | 0.1% |
| 2.6 | 1 | 0.1% |
| 2.7 | 2 | 0.1% |
| 2.9 | 2 | 0.1% |
| 3 | 1 | 0.1% |
| 3.05 | 1 | 0.1% |
| 3.1 | 2 | 0.1% |
| 3.2 | 1 | 0.1% |
| 3.3 | 1 | 0.1% |
| 5.27 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 67115 | 1 | 0.1% |
| 28598 | 1 | 0.1% |
| 24336 | 1 | 0.1% |
| 21331 | 1 | 0.1% |
| 20850 | 1 | 0.1% |
| 9948 | 1 | 0.1% |
| 9416 | 1 | 0.1% |
| 3384 | 1 | 0.1% |
| 1835 | 1 | 0.1% |
| 1708 | 1 | 0.1% |
co
Real number (ℝ≥0)
| Distinct | 1004 |
|---|---|
| Distinct (%) | 50.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1766.332461 |
| Minimum | 0.4 |
| Maximum | 65700 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0.4 |
|---|---|
| 5-th percentile | 31.5 |
| Q1 | 79 |
| median | 183 |
| Q3 | 568.5 |
| 95-th percentile | 12725.5 |
| Maximum | 65700 |
| Range | 65699.6 |
| Interquartile range (IQR) | 489.5 |
Descriptive statistics
| Standard deviation | 5520.179564 |
|---|---|
| Coefficient of variation (CV) | 3.12522115 |
| Kurtosis | 31.28977432 |
| Mean | 1766.332461 |
| Median Absolute Deviation (MAD) | 125 |
| Skewness | 5.04661423 |
| Sum | 3516767.93 |
| Variance | 30472382.42 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 183 | 30 | 1.5% |
| 62 | 15 | 0.8% |
| 55 | 15 | 0.8% |
| 59 | 15 | 0.8% |
| 67 | 14 | 0.7% |
| 61 | 14 | 0.7% |
| 49 | 13 | 0.7% |
| 52 | 12 | 0.6% |
| 65 | 12 | 0.6% |
| 80 | 12 | 0.6% |
| Other values (994) | 1839 | 92.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.4 | 1 | 0.1% |
| 3.6 | 1 | 0.1% |
| 3.7 | 2 | 0.1% |
| 4 | 1 | 0.1% |
| 4.5 | 1 | 0.1% |
| 4.6 | 1 | 0.1% |
| 4.8 | 2 | 0.1% |
| 5 | 2 | 0.1% |
| 5.4 | 1 | 0.1% |
| 5.6 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 65700 | 1 | 0.1% |
| 48500 | 1 | 0.1% |
| 47156 | 1 | 0.1% |
| 46170 | 1 | 0.1% |
| 44600 | 1 | 0.1% |
| 44000 | 1 | 0.1% |
| 43983 | 1 | 0.1% |
| 42354 | 1 | 0.1% |
| 39503 | 1 | 0.1% |
| 37227 | 1 | 0.1% |
bod
Real number (ℝ≥0)
| Distinct | 407 |
|---|---|
| Distinct (%) | 20.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.8311223 |
| Minimum | 0.1 |
| Maximum | 534.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0.1 |
|---|---|
| 5-th percentile | 0.6 |
| Q1 | 1.2 |
| median | 1.8965 |
| Q3 | 3.6 |
| 95-th percentile | 22.15 |
| Maximum | 534.5 |
| Range | 534.4 |
| Interquartile range (IQR) | 2.4 |
Descriptive statistics
| Standard deviation | 29.08989794 |
|---|---|
| Coefficient of variation (CV) | 4.258436119 |
| Kurtosis | 181.0025391 |
| Mean | 6.8311223 |
| Median Absolute Deviation (MAD) | 0.8965 |
| Skewness | 12.3976801 |
| Sum | 13600.7645 |
| Variance | 846.2221622 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.5 | 77 | 3.9% |
| 1 | 74 | 3.7% |
| 1.2 | 72 | 3.6% |
| 1.1 | 69 | 3.5% |
| 1.6 | 65 | 3.3% |
| 1.4 | 63 | 3.2% |
| 0.9 | 63 | 3.2% |
| 1.3 | 62 | 3.1% |
| 0.8 | 57 | 2.9% |
| 1.9 | 55 | 2.8% |
| Other values (397) | 1334 | 67.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1 | 1 | 0.1% |
| 0.25 | 1 | 0.1% |
| 0.267 | 1 | 0.1% |
| 0.28 | 1 | 0.1% |
| 0.3 | 5 | 0.3% |
| 0.4 | 19 | 1.0% |
| 0.414 | 1 | 0.1% |
| 0.425 | 1 | 0.1% |
| 0.458 | 1 | 0.1% |
| 0.467 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 534.5 | 1 | 0.1% |
| 513.5 | 1 | 0.1% |
| 441.8 | 1 | 0.1% |
| 431.5 | 1 | 0.1% |
| 359 | 1 | 0.1% |
| 354 | 1 | 0.1% |
| 341.833 | 1 | 0.1% |
| 195.4 | 1 | 0.1% |
| 185.8 | 1 | 0.1% |
| 185 | 1 | 0.1% |
na
Real number (ℝ≥0)
| Distinct | 506 |
|---|---|
| Distinct (%) | 25.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.497969362 |
| Minimum | 0 |
| Maximum | 108.7 |
| Zeros | 41 |
| Zeros (%) | 2.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.0815 |
| Q1 | 0.28 |
| median | 0.516 |
| Q3 | 1.2 |
| 95-th percentile | 5.08 |
| Maximum | 108.7 |
| Range | 108.7 |
| Interquartile range (IQR) | 0.92 |
Descriptive statistics
| Standard deviation | 3.868221118 |
|---|---|
| Coefficient of variation (CV) | 2.582309903 |
| Kurtosis | 323.9800913 |
| Mean | 1.497969362 |
| Median Absolute Deviation (MAD) | 0.316 |
| Skewness | 13.85574975 |
| Sum | 2982.457 |
| Variance | 14.96313462 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.516 | 225 | 11.3% |
| 0.1 | 77 | 3.9% |
| 0.4 | 55 | 2.8% |
| 1 | 51 | 2.6% |
| 0.2 | 50 | 2.5% |
| 0.3 | 44 | 2.2% |
| 0 | 41 | 2.1% |
| 0.5 | 29 | 1.5% |
| 0.6 | 23 | 1.2% |
| 0.08 | 21 | 1.1% |
| Other values (496) | 1375 | 69.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 41 | 2.1% |
| 0.01 | 1 | 0.1% |
| 0.02 | 4 | 0.2% |
| 0.03 | 3 | 0.2% |
| 0.04 | 2 | 0.1% |
| 0.05 | 7 | 0.4% |
| 0.06 | 14 | 0.7% |
| 0.07 | 7 | 0.4% |
| 0.08 | 21 | 1.1% |
| 0.083 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 108.7 | 1 | 0.1% |
| 58.1 | 1 | 0.1% |
| 25.71 | 1 | 0.1% |
| 20.45 | 1 | 0.1% |
| 20.3 | 1 | 0.1% |
| 20.2 | 1 | 0.1% |
| 19.69 | 1 | 0.1% |
| 19.6 | 1 | 0.1% |
| 19.4 | 2 | 0.1% |
| 19.35 | 1 | 0.1% |
tc
Real number (ℝ≥0)
| Distinct | 1093 |
|---|---|
| Distinct (%) | 54.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 498335.6188 |
| Minimum | 0 |
| Maximum | 511090873 |
| Zeros | 5 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 118 |
| median | 468 |
| Q3 | 1696.5 |
| 95-th percentile | 38547.5 |
| Maximum | 511090873 |
| Range | 511090873 |
| Interquartile range (IQR) | 1578.5 |
Descriptive statistics
| Standard deviation | 13754731.42 |
|---|---|
| Coefficient of variation (CV) | 27.60134115 |
| Kurtosis | 1076.580903 |
| Mean | 498335.6188 |
| Median Absolute Deviation (MAD) | 409 |
| Skewness | 31.71407152 |
| Sum | 992186217 |
| Variance | 1.891926364 × 10¹⁴ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 468 | 133 | 6.7% |
| 10 | 14 | 0.7% |
| 33 | 12 | 0.6% |
| 45 | 12 | 0.6% |
| 63 | 11 | 0.6% |
| 36 | 10 | 0.5% |
| 41 | 9 | 0.5% |
| 32 | 9 | 0.5% |
| 22 | 9 | 0.5% |
| 350 | 9 | 0.5% |
| Other values (1083) | 1763 | 88.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5 | 0.3% |
| 2 | 1 | 0.1% |
| 3 | 4 | 0.2% |
| 4 | 3 | 0.2% |
| 5 | 7 | 0.4% |
| 6 | 7 | 0.4% |
| 7 | 2 | 0.1% |
| 8 | 6 | 0.3% |
| 9 | 3 | 0.2% |
| 10 | 14 | 0.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 511090873 | 1 | 0.1% |
| 300000000 | 1 | 0.1% |
| 160405392 | 1 | 0.1% |
| 6400000 | 1 | 0.1% |
| 967500 | 1 | 0.1% |
| 712500 | 1 | 0.1% |
| 622500 | 1 | 0.1% |
| 440750 | 1 | 0.1% |
| 392500 | 1 | 0.1% |
| 358000 | 1 | 0.1% |
wqi
Real number (ℝ≥0)
| Distinct | 257 |
|---|---|
| Distinct (%) | 12.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 77.03411351 |
| Minimum | 19.3 |
| Maximum | 99.8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 KiB |
Quantile statistics
| Minimum | 19.3 |
|---|---|
| 5-th percentile | 50.2 |
| Q1 | 71.74 |
| median | 79.68 |
| Q3 | 87.66 |
| 95-th percentile | 93.64 |
| Maximum | 99.8 |
| Range | 80.5 |
| Interquartile range (IQR) | 15.92 |
Descriptive statistics
| Standard deviation | 12.85271857 |
|---|---|
| Coefficient of variation (CV) | 0.1668445054 |
| Kurtosis | 1.824096664 |
| Mean | 77.03411351 |
| Median Absolute Deviation (MAD) | 7.94 |
| Skewness | -1.286896152 |
| Sum | 153374.92 |
| Variance | 165.1923746 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 87.66 | 90 | 4.5% |
| 88.38 | 82 | 4.1% |
| 82.94 | 76 | 3.8% |
| 82.04 | 67 | 3.4% |
| 83.7 | 65 | 3.3% |
| 66.44 | 59 | 3.0% |
| 88.2 | 57 | 2.9% |
| 82.76 | 50 | 2.5% |
| 88.56 | 45 | 2.3% |
| 94.18 | 44 | 2.2% |
| Other values (247) | 1356 | 68.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19.3 | 1 | 0.1% |
| 21.52 | 1 | 0.1% |
| 23.44 | 1 | 0.1% |
| 28.12 | 1 | 0.1% |
| 28.66 | 1 | 0.1% |
| 28.66 | 1 | 0.1% |
| 30.04 | 2 | 0.1% |
| 30.54 | 1 | 0.1% |
| 32.78 | 1 | 0.1% |
| 33.34 | 15 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.8 | 1 | 0.1% |
| 99.62 | 3 | 0.2% |
| 99.44 | 3 | 0.2% |
| 98.9 | 1 | 0.1% |
| 94.76 | 1 | 0.1% |
| 94.22 | 4 | 0.2% |
| 94.18 | 44 | 2.2% |
| 94 | 19 | 1.0% |
| 93.82 | 23 | 1.2% |
| 93.64 | 12 | 0.6% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
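As a rough illustration of that definition, the sketch below ranks each variable (ties share the mean of their positions) and then applies the ordinary covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson` and `spearman` and the sample data are hypothetical and are not taken from the report or from pandas-profiling.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)   # close to 1: the relationship is perfectly monotonic
r = pearson(x, y)      # below 1: the relationship is not linear
print(rho, r)
```

The gap between the two printed values is the practical reason to read the Spearman matrix alongside the Pearson one for heavily skewed columns.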
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
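A minimal sketch of that calculation, using made-up numbers and a hypothetical helper `pearson_r`, also checks the invariance claim by shifting and rescaling both variables (with positive scale factors) before recomputing r:

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)

# Illustrative values only; they are not meant to reproduce the report's matrices.
x = [6.7, 5.7, 6.3, 5.8, 5.8]
y = [93.82, 76.96, 79.28, 69.34, 77.14]

r = pearson_r(x, y)
# Separate changes of location and (positive) scale leave r unchanged.
r_rescaled = pearson_r([2.0 * a + 10.0 for a in x], [0.5 * b - 3.0 for b in y])
print(r, r_rescaled)
```

Note that a negative scale factor on one variable flips the sign of r, so the invariance holds for positive rescaling only.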
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
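The pair-counting definition can be sketched directly. The function below, a hypothetical `kendall_tau_a`, implements the formula exactly as worded above (often called tau-a); it makes no correction for ties, so it may differ from the tau-b variant that statistical libraries commonly compute when tied values are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Made-up data: one swapped pair among four observations.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)   # 5 concordant, 1 discordant, 6 pairs in total
print(tau)
```

This brute-force version compares every pair, so it is quadratic in the number of observations; that is fine for a sketch but slow for large tables.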
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | Unnamed: 0 | station | do | ph | co | bod | na | tc | wqi |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1393.0 | 6.7 | 7.5 | 203.0 | 1.8965 | 0.1 | 27.0 | 93.82 |
| 1 | 1 | 1399.0 | 5.7 | 7.2 | 189.0 | 2.0000 | 0.2 | 8391.0 | 76.96 |
| 2 | 2 | 1475.0 | 6.3 | 6.9 | 179.0 | 1.7000 | 0.1 | 5330.0 | 79.28 |
| 3 | 3 | 3181.0 | 5.8 | 6.9 | 64.0 | 3.8000 | 0.5 | 8443.0 | 69.34 |
| 4 | 4 | 3182.0 | 5.8 | 7.3 | 83.0 | 1.9000 | 0.4 | 5500.0 | 77.14 |
| 5 | 5 | 1400.0 | 5.5 | 7.4 | 81.0 | 1.5000 | 0.1 | 4049.0 | 77.14 |
| 6 | 6 | 1476.0 | 6.1 | 6.7 | 308.0 | 1.4000 | 0.3 | 5672.0 | 75.44 |
| 7 | 7 | 3185.0 | 6.4 | 6.7 | 414.0 | 1.0000 | 0.2 | 9423.0 | 75.44 |
| 8 | 8 | 3186.0 | 6.4 | 7.6 | 305.0 | 2.2000 | 0.1 | 4990.0 | 82.04 |
| 9 | 9 | 3187.0 | 6.3 | 7.6 | 77.0 | 2.3000 | 0.1 | 4301.0 | 82.76 |
Last rows
| | Unnamed: 0 | station | do | ph | co | bod | na | tc | wqi |
|---|---|---|---|---|---|---|---|---|---|
| 1981 | 1981 | 1160.0 | 7.3 | 178.0 | 6.7 | 1.5 | 0.138 | 190.0 | 72.06 |
| 1982 | 1982 | 1161.0 | 7.1 | 214.0 | 6.8 | 2.3 | 0.585 | 350.0 | 72.06 |
| 1983 | 1983 | 1162.0 | 7.5 | 293.0 | 7.2 | 1.2 | 0.568 | 35.0 | 77.68 |
| 1984 | 1984 | 1328.0 | 6.9 | 146.0 | 7.1 | 2.0 | 0.506 | 38.0 | 77.68 |
| 1985 | 1985 | 1329.0 | 7.0 | 136.0 | 7.5 | 1.4 | 0.609 | 205.0 | 72.06 |
| 1986 | 1986 | 1330.0 | 7.9 | 738.0 | 7.2 | 2.7 | 0.518 | 202.0 | 72.06 |
| 1987 | 1987 | 1450.0 | 7.5 | 585.0 | 6.3 | 2.6 | 0.155 | 315.0 | 72.06 |
| 1988 | 1988 | 1403.0 | 7.6 | 98.0 | 6.2 | 1.2 | 0.516 | 570.0 | 66.44 |
| 1989 | 1989 | 1404.0 | 7.7 | 91.0 | 6.5 | 1.3 | 0.516 | 562.0 | 66.44 |
| 1990 | 1990 | 1726.0 | 7.6 | 110.0 | 5.7 | 1.1 | 0.516 | 546.0 | 66.44 |